## Warning in library(package, lib.loc = lib.loc, character.only = TRUE,
## logical.return = TRUE, : there is no package called 'kableExtra'
We load two data files: air_travel_MB.csv contains the
network description and airport_codes_CAN.csv, which has
some information about all airfields in Canada.
The data in air_travel_MB is pre-processed. It represents an extreme simplification of a much more complete dataset giving, for a given time period, the total number of travellers between 3734 airports worldwide along trips including up to 5 intermediate stops.
What I did is to keep only the airports in Manitoba. Any airport
outside of Manitoba is set as RoW (rest of the world).
Also, the existing multi-leg trips within Manitoba are “decomposed”: if
the data had 10 people flying, say, YWG\(\to\)YQD\(\to\)YTH (Winnipeg to The Pas to Thompson),
these would appear as 10 people flying from YWG to YQD and 10 people
flying from YQD to YTH.
Flights to and from RoW are decomposed the same way as flights within Manitoba, increasing the number of trips between RoW and YWG. Indeed, 5 passengers flying RoW\(\to\)YWG\(\to\)YQD and 5 passengers flying RoW\(\to\)YWG\(\to\)YBR (Brandon) will show as 10 passengers flying RoW to YWG, together with 5 passengers flying YWG\(\to\)YQD and 5 passengers flying YWG\(\to\)YBR.
First, here is what the flight data looks like. (I am using
knitr::kable to make the table more readable.)
| orig | dest | vol |
|---|---|---|
| RoW | YBR | 2002 |
| RoW | YTH | 18 |
| RoW | YWG | 157323 |
| RoW | YYQ | 1474 |
| XLB | RoW | 766 |
| XLB | XTL | 114 |
The airport data, filtered to only retain MB locations, looks as follows.
| Airport_ID | Identifier | Type | Name | Latitude | Longitude | Elevation_Ft | Continent | ISO_Country_Code | ISO_Region | Municipality | Is_Scheduled_Service | GPS_Code | IATA_Code | Local_Code | Home_Link | Wikipedia_Link | Keywords |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 35391 | CA-0006 | closed | RCAF Station Carberry | 49.87210 | -99.39730 | NA | NA | CA | CA-MB | Carberry | FALSE | NA | NA | NA | NA | https://en.wikipedia.org/wiki/RCAF_Station_Carberry | NA |
| 39668 | CA-0022 | closed | Anama Bay-Dauphin River Airport | 51.96267 | -98.13684 | NA | NA | CA | CA-MB | NA | FALSE | NA | NA | NA | NA | NA | NA |
| 39677 | CA-0031 | closed | Austin Airport | 49.93333 | -98.91666 | NA | NA | CA | CA-MB | NA | FALSE | NA | NA | NA | NA | NA | NA |
| 39686 | CA-0040 | closed | Beausejour Airport | 50.13697 | -96.20822 | NA | NA | CA | CA-MB | Beausejour | FALSE | NA | NA | NA | NA | NA | NA |
| 39694 | CA-0048 | closed | Bethany Airport | 50.35000 | -99.75000 | NA | NA | CA | CA-MB | NA | FALSE | NA | NA | NA | NA | NA | NA |
| 39696 | CA-0050 | closed | Bird Airport | 56.50000 | -94.21667 | NA | NA | CA | CA-MB | NA | FALSE | NA | NA | NA | NA | NA | NA |
Note that there is an alternative to dplyr (the package
that allows us to use the pipe %>%); it is indeed
possible to treat data frames as SQL tables and run SQL queries on them.
This may make more sense to some of you. For illustration, to create the
table of movements, I used the following command (which can run, since
it is not destructive of what we have already done).
query = "SELECT orig,dest,SUM(vol) AS vol
FROM air_travel_MB
GROUP BY orig,dest
ORDER BY orig,dest"
air_travel_MB = sqldf::sqldf(query)
Finish preparing the data: the following code (chunk not shown) adds
the latitude and longitudes of all the airports in the
air_travel_MB data set. Note that we give RoW
a location: we will show it as an additional vertex in the graph. (I
picked somewhere in Ontario, close to but not in Manitoba.)
| IATA | lon | lat |
|---|---|---|
| RoW | -90.0000 | 49.5000 |
| XLB | -101.4690 | 58.6175 |
| XSI | -98.9072 | 56.7928 |
| XTL | -98.5122 | 58.7061 |
| YBR | -99.9519 | 49.9100 |
| YBT | -101.6790 | 57.8894 |
We are now in a position to make the graph and plot it in simple form. We add the volume as the weight of the arcs.
We also make a subgraph without RoW to focus on the
properties really specific to Manitoba.
Before we explore the properties of the graph, let us spend a little time on visualisation. This is not to be neglected: nice plots help more than you think.
The next piece of code does the following: - Redefine the output size
(does it once for all, need to modify if need be) - Uses the library
raster to read in the shape file for Canada - Selects
Manitoba from the Canada shape file - Plots Manitoba - Plots the
graph
Let us add the degree information as the size of the vertices.
Now let us take a look at the arcs. Maybe change their thickness to reflect the volumes along the arcs?
My guess: some of the arcs are so massive that this is all we can see. So let us renormalise the values of the weights. First, what do they look like now?
## [1] 18 47 48 48 50 98 99 114 116 174
## [11] 176 208 208 212 242 276 278 279 288 288
## [21] 288 292 292 309 337 337 343 344 364 366
## [31] 368 369 372 383 399 401 401 406 447 447
## [41] 449 451 451 462 529 565 567 582 592 644
## [51] 650 652 652 682 684 694 696 701 702 713
## [61] 714 758 766 780 806 848 854 1036 1050 1167
## [71] 1172 1242 1270 1279 1342 1349 1394 1462 1464 1474
## [81] 1485 1543 1565 1607 1638 1750 1857 2002 2230 2518
## [91] 2582 2877 2899 2910 2924 3037 3157 3355 3355 3365
## [101] 4246 4571 5770 5789 7430 11464 157323 183256
So, yes, clearly a bit too much variation. If we scale linearly by multiplying by a constant, we will most likely lose the information on the small weights. Just to check that we are intuiting correctly, though, let us try once doing this.
What we suspected, indeed: to make the arcs with the largest volume not too large, we had to essentially get rid of the weight of the arcs with small volumes. So what we want to do is to process the values through a function that would make the smaller values larger relative to the larger ones or the larger values closer to the smaller one, then scale the result. For this, it is good to look at what the values look like.
One way around this could be to use a so-called sigmoid function.
This is a function of the following form: \[
f(x) = \dfrac{1}{1+e^{-cx}},
\] where \(\mathbb{R}\ni
c>0\) is a parameter. This function has the following
properties: \(\lim_{x\to-\infty}f(x)=0\) and \(\lim_{x\to\infty}f(x)=1\). By playing with
the value of \(c\), one can set the
beahviour of \(f\). Let us see how
different values of \(c\) influence the
value of the points. (The viridis palette used starts with
yellow and goes towards green and then purple.)
That last one looks quite nice. We most likely will have to multiply by a factor, so we compute the values in advance.
Well, after all this, I am not convinced we achieved between than we had done above but just a linear scaling. So when we want a a good plot, we’ll come back to that solution.
For simplicity, I only show the out-degree case. And just to show how easily it’s done: let’s colour the vertices in the centre red and those in the periphery blue.
## Central points:
## [1] "YFO" "YGO" "YGX" "YOH" "YQD" "YTH" "YWG" "YYQ" "ZGI" "ZTM"
## Periphery:
## [1] "XLB" "XSI" "XTL" "YBR" "YBT" "YBV" "YCR" "YDN" "YIV" "YNE" "YRS" "YST"
## [13] "ZAC"
Recall that girth in igraph assumes the
graph is undirected.
## $girth
## [1] 3
##
## $circle
## + 3/24 vertices, named, from 3c01b07:
## [1] XLB RoW XTL
## $girth
## [1] 3
##
## $circle
## + 3/23 vertices, named, from 5e4decc:
## [1] XTL XLB YTH
What fraction of possible arcs/edges are present?
## [1] 0.1956522
## [1] 0.1600791
## Warning in largest_cliques(G_MB): At
## vendor/cigraph/src/cliques/maximal_cliques_template.h:219 : Edge directions are
## ignored for maximal clique calculation.
## [[1]]
## + 5/23 vertices, named, from 5e4decc:
## [1] ZGI YGO YWG YTH YOH
## XLB XSI XTL YBR YBT YBV YCR YDN YFO YGO YGX YIV YNE YOH YQD YRS YST YTH YWG YYQ
## 2 1 2 2 1 1 2 2 3 4 2 2 2 4 3 4 2 14 17 2
## ZAC ZGI ZTM
## 1 5 3
## XLB XSI XTL YBR YBT YBV YCR YDN YFO YGO YGX YIV YNE YOH YQD YRS YST YTH YWG YYQ
## 2 1 2 2 1 1 2 2 3 4 2 3 2 4 3 4 1 14 17 2
## ZAC ZGI ZTM
## 1 5 3
## XLB XSI XTL YBR YBT YBV YCR YDN YFO YGO YGX YIV YNE YOH YQD YRS YST YTH YWG YYQ
## 4 2 4 4 2 2 4 4 6 8 4 5 4 8 6 8 3 28 34 4
## ZAC ZGI ZTM
## 2 10 6
For info, the return values of the function knn are -
knn: A numeric vector giving the average nearest neighbor
degree for all vertices in vids. - knnk: A numeric vector,
its length is the maximum (total) vertex degree in the graph. The first
element is the average nearest neighbor degree of vertices with degree
one, etc.
## $knn
## XLB XSI XTL YBR YBT YBV YCR YDN
## 16.000000 28.000000 16.000000 19.000000 28.000000 34.000000 19.000000 19.000000
## YFO YGO YGX YIV YNE YOH YQD YRS
## 22.666667 20.000000 31.000000 17.400000 19.000000 20.000000 22.666667 13.750000
## YST YTH YWG YYQ ZAC ZGI ZTM
## 24.333333 7.142857 6.705882 31.000000 28.000000 17.200000 23.333333
##
## $knnk
## [1] NaN 29.500000 24.333333 21.250000 17.400000 22.888889 NaN
## [8] 17.916667 NaN 17.200000 NaN NaN NaN NaN
## [15] NaN NaN NaN NaN NaN NaN NaN
## [22] NaN NaN NaN NaN NaN NaN 7.142857
## [29] NaN NaN NaN NaN NaN 6.705882
## $knn
## XLB XSI XTL YBR YBT YBV YCR YDN
## 16.000000 28.000000 16.000000 19.000000 28.000000 34.000000 19.000000 19.000000
## YFO YGO YGX YIV YNE YOH YQD YRS
## 22.666667 20.000000 31.000000 15.000000 19.000000 20.000000 22.666667 13.750000
## YST YTH YWG YYQ ZAC ZGI ZTM
## 34.000000 7.142857 6.705882 31.000000 28.000000 17.200000 23.333333
##
## $knnk
## [1] 30.400000 21.250000 20.916667 17.916667 17.200000 NaN NaN
## [8] NaN NaN NaN NaN NaN NaN 7.142857
## [15] NaN NaN 6.705882
## $knn
## XLB XSI XTL YBR YBT YBV YCR YDN
## 16.000000 28.000000 16.000000 19.000000 28.000000 34.000000 19.000000 19.000000
## YFO YGO YGX YIV YNE YOH YQD YRS
## 22.666667 20.000000 31.000000 21.000000 19.000000 20.000000 22.666667 13.750000
## YST YTH YWG YYQ ZAC ZGI ZTM
## 19.500000 7.142857 6.705882 31.000000 28.000000 17.200000 23.333333
##
## $knnk
## [1] 29.500000 21.050000 22.888889 17.916667 17.200000 NaN NaN
## [8] NaN NaN NaN NaN NaN NaN 7.142857
## [15] NaN NaN 6.705882
Let us plot the values in the graph, say, for the in-knn.
## XLB XSI XTL YBR YBT YBV YCR YDN YFO YGO YGX YIV YNE YOH YQD YRS YST YTH YWG YYQ
## 4 2 4 4 2 2 4 4 6 8 4 4 4 8 6 6 3 8 8 4
## ZAC ZGI ZTM
## 2 8 6
## XLB XSI XTL YBR YBT YBV YCR YDN YFO YGO YGX YIV YNE YOH YQD YRS YST YTH YWG YYQ
## 2 1 2 2 1 1 2 2 3 4 2 2 2 4 3 3 1 4 4 2
## ZAC ZGI ZTM
## 1 4 3
## XLB XSI XTL YBR YBT YBV YCR YDN YFO YGO YGX YIV YNE YOH YQD YRS YST YTH YWG YYQ
## 2 1 2 2 1 1 2 2 3 4 2 2 2 4 3 3 2 4 4 2
## ZAC ZGI ZTM
## 1 4 3
Let us look at the out-coreness, for instance, and colour the vertices as a function of that.
## RoW XLB XSI XTL YBR YBT YBV
## 63.416667 0.000000 0.000000 0.000000 2.166667 0.000000 0.000000
## YCR YDN YFO YGO YGX YIV YNE
## 0.000000 0.000000 0.000000 0.000000 0.000000 0.500000 0.000000
## YOH YQD YRS YST YTH YWG YYQ
## 0.000000 0.000000 2.666667 0.000000 191.916667 250.166667 0.000000
## ZAC ZGI ZTM
## 0.000000 5.083333 3.083333
## XLB XSI XTL YBR YBT YBV YCR
## 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
## YDN YFO YGO YGX YIV YNE YOH
## 0.000000 0.000000 0.000000 0.000000 0.500000 0.000000 0.000000
## YQD YRS YST YTH YWG YYQ ZAC
## 0.000000 2.666667 0.000000 211.666667 280.166667 0.000000 0.000000
## ZGI ZTM
## 6.000000 4.000000
First, the unweighted case, i.e., using the geodesic distance.
## XLB XSI XTL YBR YBT YBV YCR YDN YFO YGO
## 0.02000 0.01961 0.02000 0.02128 0.01961 0.02083 0.02128 0.02128 0.02439 0.02500
## YGX YIV YNE YOH YQD YRS YST YTH YWG YYQ
## 0.02381 0.02174 0.02128 0.02500 0.02439 0.02222 0.02128 0.03333 0.03704 0.02381
## ZAC ZGI ZTM
## 0.01961 0.02564 0.02439
## XLB XSI XTL YBR YBT YBV YCR YDN YFO YGO
## 0.02000 0.01961 0.02000 0.02128 0.01961 0.02083 0.02128 0.02128 0.02439 0.02500
## YGX YIV YNE YOH YQD YRS YST YTH YWG YYQ
## 0.02381 0.02174 0.02128 0.02500 0.02439 0.02222 0.02083 0.03333 0.03704 0.02381
## ZAC ZGI ZTM
## 0.01961 0.02564 0.02439
## XLB XSI XTL YBR YBT YBV YCR YDN YFO YGO
## 0.02000 0.01961 0.02000 0.02128 0.01961 0.02083 0.02128 0.02128 0.02439 0.02500
## YGX YIV YNE YOH YQD YRS YST YTH YWG YYQ
## 0.02381 0.02128 0.02128 0.02500 0.02439 0.02222 0.02128 0.03333 0.03704 0.02381
## ZAC ZGI ZTM
## 0.01961 0.02564 0.02439
We can also compute closeness using weighted arcs, i.e., using the travel volumes. Geodesic distance shown for comparison. We save the results for future manipulation.
## XLB XSI XTL YBR YBT YBV YCR
## 0.02000000 0.01960784 0.02000000 0.02127660 0.01960784 0.02083333 0.02127660
## YDN YFO YGO YGX YIV YNE YOH
## 0.02127660 0.02439024 0.02500000 0.02380952 0.02127660 0.02127660 0.02500000
## YQD YRS YST YTH YWG YYQ ZAC
## 0.02439024 0.02222222 0.02127660 0.03333333 0.03703704 0.02380952 0.01960784
## ZGI ZTM
## 0.02564103 0.02439024
## XLB XSI XTL YBR YBT YBV
## 3.556061e-05 4.171882e-05 3.788453e-05 4.025603e-05 3.989150e-05 2.928172e-05
## YCR YDN YFO YGO YGX YIV
## 3.762935e-05 3.726893e-05 3.605163e-05 5.860634e-05 2.420721e-05 3.335557e-05
## YNE YOH YQD YRS YST YTH
## 2.986323e-05 6.466632e-05 4.283206e-05 5.351887e-05 2.247494e-05 6.413134e-05
## YWG YYQ ZAC ZGI ZTM
## 5.218934e-05 4.675738e-05 4.381353e-05 6.568576e-05 5.549390e-05
To prepare for plots, let us look at the ranges of values.
## [1] 0.01960784 0.03703704
## [1] 2.247494e-05 6.568576e-05
So if we multiply the first values by 10,000 and the second ones by 1e7, we should get good ranges of variation.
## [1] 196.0784 370.3704
## [1] 224.7494 656.8576
Indeed, so that’s what we do.
One interesting remark here: closeness radically changes when considering the unweighted and weighted version. Just to confirm this, let us plot the two graphs side by side.
So Thompson (YTH) is has higher closeness centrality than Winnipeg (YWG) when passenger volumes are considered. Does the situation change if we recall that Winnipeg is quite connected to the rest of the world? We prepare the computations and the plots as we had done earlier.
## [1] 0.01960784 0.03571429
## [1] 2.144864e-05 6.247267e-05
Ranges are quite similar to those earlier, so we can probably just do the same as before. We will plot the 4 figures this time. First row is as before, second row is with rest of the world information thrown in.
Actually, things do not change, except that RoW becomes important as well.